Finite genome length corrections for the mean fitness and gene probabilities in 

evolution models 



(N 

o 

O 

Q 

PL, 

d 



> 

(N 
(N 



Zara Kirakosyan 1 , David B. Saakian 1, 2 < 3 E and Chin-Kun Hu 2 ^ 

1 Yerevan Physics Institute, Alikhanian Brothers St. 2, Yerevan 375036, Armenia 
2 Institute of Physics, Academia Smica, Nankang, Taipei 11529, Taiwan 
3 National Center for Theoretical Sciences: Physics Division, 
National Taiwan University, Taipei 10617, Taiwan and 
4 Center for Nonlinear and Complex Systems and Department of Physics, 
Chung Yuan Christian University, Chungli 32023, Taiwan 
(Dated: December 7, 2012) 

Using the Hamilton- Jacobi equation approach to study genomes of length L, we obtain 1/L 
corrections for the steady state population distributions and mean fitness functions for horizontal 
gene transfer model, as well as for the diploid evolution model with general fitness landscapes. Our 
numerical solutions confirm the obtained analytic equations. Our method could be applied to the 
general case of nonlinear Markov models. 
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I. INTRODUCTION 

After the development of molecular biology, the con- 
cepts and methods in statistical physics have been widely 
used to study molecular models of biological evolution [l|- 
lH , especially in recent years [T^-|20l. |23 - I25j . Analytic 
solutions for models with the infinite genome length may 
be obtained via several methods, including the maximum 
principle [l2l . [l3j . Trotter-Suzuki method in quantum 
statistical physics [3, [H, [I?], HU , quantum field theory 
and the Hamiltonian- Jacobi equation method 
. The infinite genome length limit has been 
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solved fora more complicated (more nonlinear) evolution 



model: the horizontal gene transfer model [H|,[2(| ! 
This model has been solved both for the haploid 
and the hyper-geometric diploid case 041, [HI ■ 

The study of finite-size corrections is important not 
only for lattice phase transition models, but also for pop- 
ulation genetics (2(| [27| ■ Here there are situations with 
few relevant genes for the evolution problem. Although 
the genome length may be large, e.g. about 10000 in 
the case of HIV, it is possible to consider only the vari- 
able part of the genome which has only 40 — 100 bases 
j28| ; in such a case, the finite-size effects cannot be over- 
looked. The study of finite-size corrections allows us to 
understand as to what extent the results for the infinite 
genome length can be used to represent those for the fi- 
nite genome length. 

In recent works flfjl [22J the finite genome length correc- 
tions have been calculated for the haploid molecular evo- 
lution model [|[ . In [2|| we calculated the finite size cor- 
rection for the recombination model with single-peak fit- 
ness. In the present paper we calculate the finite genome 
length corrections for a diploid model with symmetric 
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landscape [23( as well as for a haploid model with a sim- 
ple horizontal gene transfer (HGT) [ill [2(| for a general 
symmetric fitness landscape. The method may be ap- 
plied to rather general cases of nonlinear probabilistic 
models 0. 

Actually, we are constructing a perturbation expan- 
sion for the eigen-value like variable for the nonlinear 
operators. In case of linear operators (like the quan- 
tum mechanics), to calculate the first order perturbation 
to the energy we need only the 0-th order expressions of 
eigen- vectors to calculate the leading corrections to eigen- 
values due to perturbations. In case of nonlinear system 
of equations it is impossible to use the methods of linear 
algebra. Nevertheless, we can succeed using a trick. To 
calculate the perturbation expansion of eigen- value like 
variables (mean fitness or the asymptotic expression), we 
use equations such as Eq.(13), where the contribution of 
correction terms of the distribution function disappear. 
Thus the perturbation of mean- fitness ("eigen-value" for 
a nonlinear problem ) can be calculated using only the 
zeroth-order expression of cigen-functions (bulk solutions 
for probability distribution). 

The present paper is organized as follows. In Section 

II we derive the finite size corrections for a haploid model 
(Crow-Kimura model) with a new method. In Sections 

III and IV the new method is applied to the diploid evo- 
lution model and HGT model, respectively. In the Ap- 
pendix we give finite size corrections for the diploid evo- 
lution model with single-peak fitness function. 



II. FINITE SIZE CORRECTIONS FOR THE 
CROW-KIMURA MODEL 

Let us first introduce a new method to calculate finite- 
size corrections for the Crow-Kimura model with haploid 
genotypes [1] , which is easier to study. The Crow-Kimura 
model is slightly easier to solve than the Eigen model 

aa. 



2 



In the Crow-Kimura model [4|, any genotype con- 
figuration Si = (s\, s l L ) is specified by a sequence 



of L two-valued spins SI 



±1 for L > k > 1 and 



M — 1 > i > such that the value +1 represents purines 
(A,G), and the value —1 represents pyrimidines (T,C), 
where M = 2 L is the total number of different sequences. 
The difference between two configuration Si and Sj is de- 
scribed by the Hamming distance dij = {L — ^ k s t k sV)/2. 
In other words, the Hamming distance is the number of 
different spins between configurations Si and Sj. 

The relative frequency Xi of a given configuration Si 
satisfies following equation: 



dxi 
~dt 



Xi{n 



M-l 

3=0 



M-l 



(i) 



here Ti is the replication rate, i.e. the fitness of an organ- 
ism with a given genotype and in Crow-Kimura's model it 
is specified like a function of the genotype: = Lf(Si). 
In other words fitness is the average number of offspring 
of individual with genotype sequence Sj per unit period 
of time and it is very meaningful quantity in evolution 
theory, /iy is the mutation rate to move from sequence 
Si to sequence Sj per unit period of time. It is important 
that in Crow-Kimura's model 3 only single base muta- 
tions are allowed, i. e. jiij = [i for dij = I, fiij = for 
dij =/= I and i ^ j, and jiij = —L/i for i = j; the last 
condition ensures that the time evolution of Xi does not 
change the normalization condition J^jio 1 00 j = 1- ^ n 
the following, we take /i = 1. 

The nonlinear system of Eq. ([I} can be mapped into 
following linear system of equations, 



, M-l 
dpi sr-^ 

= pm + 2^ /'-//'/• 

3=0 



(It 



where pi are related to a;,; of Eq. ([T]) via [30, l31j 



(2) 



(3) 



For the symmetric fitness landscape, where are the 
same for all the sequences with the same Hamming dis- 
tance from the same reference sequence So = (I,l,...,l), 
it is more convenient to work with classes, i.e. we classify 
configurations Si into the classes according to the value 
mi = mi, where m, is so-called "magnetization" of the 
configuration Si and is defined as to,; = J2k=i S k/L, ~ 1 — 
m % < 1. 

Defining the magnetization m; for the configurations 
at the l-th class as mi = 1 — 21/L and fitness r, = Lf(mi), 
we rewrite Eq. ([2|) for the probability for a typical con- 
figuration in the Z-class as 



dpi 
dt 



Pi(Lf(mi) -L) + lpi_i + (L- l)p 



l+i- 



(4) 



We define the l-th Hamming class as the group of all 
sequences at the Hamming distance I from the reference 



sequences. Having the number of configurations at the 
i-th class Li = jjnTZTji f° r Pi = ^iPi we have 

dP, 



— - = fl(L/(m,)-i) + (Ii-I + l)fl-i 
dr 



(5) 



We are interested in calculating the mean fitness R = 
J^,, Pi f(mi) and surplus s = J^i Pi m i m the steady state. 
In [22| , one of us proposed that in the steady state In Pi 
and the mean fitness R can be written as 



InP; = LU(m,t) + 0{l), 
R = Lk + 0(1), 



(G) 



where m = 1 — 21/ L. Then, high order corrections for R, 
and for In Pi have been derived using the linear algebra 
methods and the equations for Pi/y/Li. In this section 
we derive these corrections with an alternative method 
that can be applied to the strong nonlinear situations. 

Using the ansatz Pi = exp[LU(m,t)] and a formula 
P l±1 « P /e -( ±2l O where U' = ne can trans- 

form Eq. ([5]) into the Hamilton- Jacobi equation, 



dU 
~dt 

H(m,y) 



H(m,U'), 
f(m) - 1 



=22/ 



-2y 



(7) 



where y = [/' is a dummy variable. We assume an asymp- 
totic 



U(m, t) = kt + u(m), 
and get an ordinary differential equation 

fc = /(m)-l + i±^e 2 "' + 



(8) 



1 — m 



-2u' 



2u' = In 



fc — 1 + /(m) ± y/(k- 1 + f(m)) 2 -l + nv 
1 + m 



,(9) 



which corresponds to Eq. (23) in Ref. [22j. We take the 
"+" solution for — 1 < m < mo and " — " solution for 
m > mo. 

In Eq. ©, k is a function v! . The value of u' , which 
gives the maximum value of k is defined by equation 



u o( m ) = f 



m 



(10) 



Thus the maximum value of k can be obtained from 



[V(m)} 



-Km<l, 



V(m) = H(m, uo(m)) = f(m) - 1 + v 7 ! - m 2 ,(ll) 

where the maximum of the first equation is at the point 
to - 

To calculate higher order corrections, we write InP; as 
InP; = L(k + -±)t + Lu{m) + Ul (m) + 0(1). (12) 
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Equations © and (QU imply that [22 



fei , . 1 + m o..' 1— m o,,' 

* + -j- = /(m) - 1 + — J— e + — — e 



1, o,,' 2u",l + m a,,/ . 1-m _ 2u / 



a« L e -a») + rr_(; 



2< 1 + m 2u , 1-m _ 2u , 



: e - zu ) 

(13) 



In Eq. lfTB")) . k is the bulk expression of the mean fitness 
and ki/L is the first order correction to it. Having the 
value of k\ , we can calculate . ki can be defined from 
Eq. (|13[) at the point m = m 0l where the coefficient of v! x 
becomes zero. We have an equation 



,2«'(mo) _ , P-- m 



1 + m Q 

Equation (|13|) then implies that at m 



(14) 



k x = e 2u ' + er 2u ' 



. 1 + m 



1 — m 



= 2 J_ g + 2^(mo)^/l-mg. (15) 

Let us define u"(mo). Equation Q can be written as 
k = H(m,p = u'). Expanding Eq. ^ near m = uiq 
with respect to the first and the second arguments of 
H(m,p = it') up to the second order, we derive 



Ps V"(mo) 



(m - m ) 5 



+ H pp {m ,u {m )) . (16) 

Note that there is no (m — m )(it' — u' ) term in the last 
equation. We can verify this directly: H p (m,pQ) = 0. 
From Eqs. 0, CO} and JTU]), we derive 



V"(m ) = /"(m ) - 



(1-m 2 ,) 3 / 2 ' 



u'o(m) w -(m - mo); 



1 



+ w ( TO o)- 



'2(l-mg) 

With these expressions in Eq. p^j) . we derive 
/"(mo) = - ^^(2."(m„) + 



Therefore 



2u"(m ) = --j— 



1 



{I- ml) 
1 

(1 -mg)V4 



'/" 



(!_ m 2)3/2' 



(17) 
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FIG. 1: Comparison of the 0-th order and 1/L accuracy 
expressions for the haploid model class probabilities with the 
numerics given by Eq.(4) for the fitness function /(m) = |m 2 . 
p n are the results of numerics for the original model given 
by the system of equations, pon = exp(Lu(m)) is calcu- 
lated using O(L) terms in expression of lnp„, Eq.(12), while 
pin = exp(Lu(m) + ui(m)) corresponds to the O(l) accuracy 
in expression of lnp n , Eq. (12). The smooth line corresponds 
to S — lnpon/Pn- For Fig l.a L = 80 and the dashed line 
corresponds to S — lnpi„/p n . For Fig l.b L = 160 and the 
dashed line corresponds to 5 = 101npi n /p n . 



Equations (|T0|) and (|T4|) imply that u'(mo) = u' n (mn 
Thus eventually, we can use Eq. (fT5")) to obtain [19| , [22 



ki = 



1 



VT 



/"W(l-"^) 1/4 



(l-m§)3/a 

=^[1- v/l-(l-mg)3/2/"( mo )]. (18) 
1 — mi v 



Using the expression for k\ , we can calculate the u\ from 
Eq. (S): 

, 1 + m 2 „, 1-m 2u / 



2 

2«' 



2«i 

fei - (e 



1 — m 



(19) 



Therefore, having u l5 Eq.(fT9"|). we can calculate the 1/L 
accuracy expression for the haploid class probabilities Pi 
(we define them as p\i = (Lu(m) + ui(m))), also we can 



calculate the probabilities poz 



D Lu(m) 



Eq.(H2D. The 



comparison of this computations with numerical results 
are shown in the Fig. 1 for the case L = 80, 160. 

On the other hand, we can calculate k\ directly as 
/(m,)P; - k, 



ki = 



1 



u"(s)\ 



(f'(sH(s) + ^P-). (20) 



The bulk surplus term is calculated using the equation 


f(s) = R. 

From the definition of s, the correction to the surplus can 
be written as 8s = jrijjjrnx] ■ Comparing with Eq. ([20]) . 
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we get Ss = (ki 



2f9^)/(L/'( S )). Using the formula 
/'(*) 



u"(s) = 



2s ' 



we derive, assuming f'(s) > 0: 



Ss = fcl ~ fit \ > T fit Y 



(21) 



(22) 



One can define ui(m) integrating the expression of 
u\ (m) from Eq. (|T9| . Thus we have derived the results 
of [22j by an alternative method. 



III. FIRST ORDER CORRECTIONS FOR THE 
DIPLOID CASE 

The diploid model has been defined by Eq. (1) in j23j . 
There we derived the analytic solution for the diploid 
many allele biological evolution models 041] with general 
fitness landscapes to the first order using the Hamilton- 
Jacobi equation approach. In the present paper we will 
find 1/L accuracy for mean fitness and genome probabil- 
ities of that diploid model. 

In WBS parallel diploid model gene probabilities 
Pi evolve as 



M 



M M 



= Pi E ~ E E A JkPjPk 

3=1 fc=l 7=1 



M 

+E^ 

3=1 



lijPj . 



Here rriij is a mutation matrix, defined in Section II. We 

have a balance condition Yli^iPi = 1- Aij i s the fitness 
of the genotype (Si,Sj), and ^2jAijPj is the marginal 
fitness for the sequence 5*. 

We assume that the fitness of the configuration Si is 
a smooth function of the Hamming distance between Si 
and the reference configuration S±. In such case, it is 
convenient to work with the overlap m = (1 — 2dn/L) 
instead of the Hamming distance da. Consider the fol- 
lowing choice of the matrix A: 



f(m,m), 



Ai 



Aij = f(m 1 ,m 2 ), 



(24) 



where mi = 1 — 2du/L,ni2 = 1 — 2dj\/L. In the hy- 
pergeometric model L(l + mi)/2 and L(l + m2)/2 are, 
respectively, the number of Ai maternal and paternal al- 
leles. / (1711,7712) is a smooth analytical function. We are 
interested in finding the exact phase structure and the 
steady state, therefore we can consider only symmetric 
solutions of pi . 

Using the expression Pi ~ exp[Rt + . . . ] in Eq. (13) of 
[lU , we obtain coupled equations for Pi that describe the 
diploid, one locus many allele parallel mutation-selection 
model lUSi]: 

RPi = LP1F1 + (L - I + l)P*_i + (I + l)P l+1 

- LPi(l + Y,F k P k ). (25) 



Here we define the marginal fitness for the Hamming class 
Fi = J2n=of( l - 2Z/-M - 2n/L)P n . In diploid case 
the fitness landscape is defined via a function / (mi, 7712) 
of two arguments, describing the dominance relations 
among alleles 7 9]. Eq.(21) coincides with the Crow- 
Kimura model where Fi is the fitness function for the 
Hamming class 1. 

Assuming ansatz © and ©, we derive the bulk ex- 
pression for steady state solution as 

k = f(m, s) - 1 + i±ZV"> 1 ) + i^ e - 2 «'M, (26) 

where u(m) is defined via Eqs.(6),(8). The mean fitness 
per spin k and surplus s = J2i PiO- ~ 21/ L) are defined 
via a system [23| 



f m (m ,s) 



f(s,s), 
m 



(27) 



To find an expression for Fi with 1/L accuracy, we 
calculate the following integral: 



Ft 



f{m 1 x)e Lu ^ )+u ^ x) dx, 



(28) 



(23) where m = 1 — 21/ L. Therefore we derive 



Fi-f(m,s) 



tfCm, *)[«!(*)] , f&[m,8] 



/b(77l,S 



L\u"(s)\ 2L\u"(s) 
A f^[ m ,s] 



2L|u"(s)| ; 



(29) 



where A — [u' 1 (s)]/|m"(s)|, s is the surplus, /^(m, s) is 
the first derivative of the fitness function via the second 
argument, and f^ b (m,s) is the second derivative of the 
fitness function via the second argument. 

For the mean fitness R/L = k + from Eq. ([29j we 
have 



R 
L 



E F ' P <= / /K s )e L " 



(m)+u 1 (m) dm 



/h(777,S 



A , f&[m,s] 



L 2\u"(s)\ 



(30) 



Expanding f(m, s) via the first argument in the equation 
above, we obtain 



h = iti(s,s) + f a (s,s)) 



A 

L 2L\u"(s)\ 



(31) 



with /a( m i s ) being the first derivative of the fitness func- 
tion via the first argument and f' aa {m, s) being the second 
derivative of the fitness function via the first argument. 
On the other hand, we can get an expression for k\ by 



adding the term f' b {m, s)4 + 



fbbl m - s ] 



to the right-hand 
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side of Eq. ([T9l) . Thus we obtain the following equation 
for the corrections: 



[f^s,s) + f a (s,s)]A + 



m*, *]+&[*>»] 



2\u"(s)\ 



+ fi(m,s)A- 



2\u"(s)\ 
„i 1 — m 



-2u'l 



2cosh(2u')-(32) 



The first line is derived via direct integration of the fitness 
function via the steady state distribution, the u[ term 
in the second line is just the small u[ correction of the 
right hand side of Eq. (|26|) . The third line corresponds 
to Eq. (|29| . The last line corresponds to the correction 
terms of the haploid model. 

To calculate fci we again consider the point where the 
coefficient of u[ disappears in the last equation. We cal- 
culate u"(mo) as in the previous section. We have an 
equation for A 

W(s,s) + f' a (s ) s)-fi(m ,s}A 

/&(™o,s) -/&(«,«)] 

2|«"(*)| 



f^mo,s)-f^(s, 3 )-f^a,a)] 
2\u"(s)\ 



(33) 



where mo is the bulk magnetization. Then putting the 
value of A in Eq. (|33|) . we get an expression for k\\ 



L 


100 


100 


100 


150 


150 


150 


a 


4.5 


5.5 


6 


4.5 


5.5 


6 


kji 


3.18506 


4.15619 


4.64491 


3.18455 


4.15574 


4.64449 


S'o 


0.00152 


0.00132 


0.00124 


0.001014 


0.000883 


0.000829 


Si 


0.00001 


0.00001 


0.00001 


0.000005 


0.000005 


0.000005 



TABLE I: Comparison of 0-th order accuracy expression Oq = 
k„ — ko and the 1/L accuracy expression S[ = k n — k results 
for the mean fitness of diploid model, for the fitness function 
/o(mi,m2) = §(tii + mV) + 6mim2,for b = 0.5. k n is given 
by direct numerics of Eq.(4). 




FIG. 2: Comparison of 0-th order accuracy expression and 
the 1/L accuracy expression for the p„ in diploid model, for 
the fitness function /o(mi, 7712) = § (m\ + m%) + bm\m2 and 
a — 4.5, b = 0.5. p„ are the results of numerics for the of 
the original model given by the system of equations, pon = 
exp(Lu(m)) is calculated using O(L) terms in expression of 
lnp n , Eq.(12), while pi„ = exp(Lit(m) + ui(m)) corresponds 
to the O(l) accuracy in expression of lnp n , Eq. (12). 5 = 
Inpon/pn, the smooth line. For Fig 2. a L — 50 S — \npi n /p n 
is the dashed line. For Fig 2.b L = 100, and S = 101np2n/Pn 
is the dashed line. 



_ /£,(«,«) + /&(«,«) 
2\u"(s)\ 

f>(s,s) + f! 1 (s,s)-f>(mo,s) 

JbMo,-s)-^ a ( S , S )-n' b (s,s) 

1 2\u"(s)\ 



+ k lh ], (34) 



Having ui by Eq. ([32)) . we calculate the 1/L accu- 
racy expression for the diploid class probabilities p2n = 
exp(Lit(?7i) + ui(m)). In Fig. 2 we give the comparison 
of our analytical results with the direct numerics. 



kih 



where 



[1- ^l-f!U(mo,s)(l-ml)^] 



u"{s) 



f'a(s,s) 

2s 



(35) 



From Eq. ([2T|) we calculate first order accuracy expres- 
sion for the diploid mean fitness k t heor = and then 
from Eq. ([M|) we compute the 1/L accuracy expression 
for mean fitness k = ko + ki/L. Having all these expres- 
sions, we compare our analytical results with the direct 
numerics for different values of L and different values of 
a in Table I. 



IV. HORIZONTAL GENE TRANSFER MODEL 



The finite genome length effects are very important 
for the case of horizontal gene transfer(HGT). While the 
genome length could be large (about 10000 in case of 
HIV), it is possible to consider only the variable part 
of g enome, L = 40 - 100 [H]). We consider the model 
[l6l. [20I] describing a simple horizontal gene transfer. The 
model has been solved 2C| in the large genome limit in 
[20I ] (mean fitness) and 



23| (steady state distribution). 



In this model, besides the mutation process, one spin in 
the genome is replaced with the spin at the same position 
from the sequence pool. 
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We consider the following system of equations [lg, l2C§ : 
dPi 



dt 

m 



= Pm + [(L — l + l)Pi-i + + l)P;+i] - 

1 X - 



cPi + c( 



L z -' 

k 

I I 



LL 



r k Pk) 



L-IL-L 
~~L T 



)PI + 



l-lj-l l + l , l + l. „ , 



L 



L 



L 



where I = J^i-FJ/ = (1 — s)L/2. The first and second 
lines correspond to the Crow-Kimura model Q , [l2| • The 
third line describes the HGT process with the change of 
the Hamming class. We cut plus spin with the probabil- 
ity (1 — l/L) and add plus spin from the sequence pool 
with the probability (1 — l/L), then cut minus spin with 
probability l/L and add minus spin from the sequence 
pool with the probability l/L. The last line corresponds 
to the process with the change of Hamming class. We cut 
the plus spin with the probability (l—l)/L and add minus 
spin from sequence pool with the probability 1 — (I— 1)/L, 
then cut the minus spin with the probability (I + 1)/L 
and add plus spin from sequence pool with the plus spin 
with the probability 1 — (I + 1)/L. 
Considering the ansatz 

fci 

Pi = cxp[L(k + — )t + Lu(m) + ui(m)}, 
±j 

we get the following equation for the bulk terms 
k = H(rn, v!) 

H(m,p) = f(m) H — e 2p [l 4 



l — to 



2 

cms c 



(37) 



where r; = Lf(m) and p is a dummy variable. The min- 
imum of H(m,u') is at u' — u (m), where 



M (l-m)(l + c^) 



(1 + to)(1 + c±^) 
gives a potential V(m, s) = min[H{m, u')] 

V(m, s) = f{m) + V(1-™ 2 )C + 



(38) 



cms c ^ 
~2 2 ~ ' 



c 2 s 2 , 



a = [(i + -) 2 -— , 

We define to , s from the conditions 

V'(m ,s)=0, f(s)=V(m ,s), 



(39) 



(40) 



where the derivative is with respect to the first argument. 
Let us calculate the u" [mo). Near mo we have an expan- 
sion 







V"(m ,s) 



(m - mo) 2 



+ H"(m ,u' ) 



, (u'(m) - u (m)f 



(41) 



where V" is the second derivative of V with respect to 
the first argument, H' pp is the second partial derivative of 
H' with respect to the second argument of the function 
H(m,p). Dividing Eq. (|4"Tj) by (to — too) 2 , we derive the 
equation 



f"(™> ) 



C 



(1 - TO 2 ) 3/2 



y/(\-m?)C{u" - u' r f (42) 



or 



2u"(m ) 



f" 4- 

■I " r (l-mg)3/ 2 



(I -ml) \{l-ml)Cy/* 
From Eq.(36) we have: 

iWi(*) , f"(s) 



(43) 



h = 



2m" (s) 



2v! x \e' 



> 1 + TO 



(1 



1 - s, 



1 — m /-, 1 + S^ _ 2 , /. 



[«"-'(! + c i_!) + (l + c l + £)e-*.'] 



+ 2u"[e 

1 — TO 



2u' 1 + m 

2 v ~ ' ~ 2 



1 s . 
0- + C-Z-) 



(l + c i±^)e- 2 "'] 



cu[(s) 



m - e 2«' i±m +e -2«' l=m 
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The first line is a direct integration of the fitness via 
steady state distribution, the second line corresponds to 
the correction of F[m,p) via dp = u^, the third line cor- 
responds to the corrections in the first line of Eq. (f3l))) . 
The last line corresponds to the Ss corrections from the 
/terms in Eq. ([36| . 

To calculate k\, we put the optimal value of u' and 
look at the point to = too- Then Eq.(44) is simplified: 

f'(s)u[(s) , f"(s) 



u" 
2 



yl — m, 
+cu' 1 (s) 





toq 
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2VC 



2| U "(a)| 
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2|«"(*)| 



/"(l - TO 2 ) 3 / 2 



Deriving u'^s) from the last equation, we obtain 

/'(*M(«) , /"(«) 
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(45) 
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Thus 



fci 



f"(s)s 
/'(*) 



f(s) 



/'(*) - c- 



(47) 
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Vc 



/"(m )(l-m2)3/2 /"( s ) 
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For the fitness function /(m) = cm 2 with parame- 
ters c = 1, we derive from the numerics for the mean 
fitness and order parameters R/L = 0.218142, mo = 
0.765984, s = 0.467057. The analytical formula Eq.ffl 
for the infinite genome limit gives k = 0.21842, while 
the numerics for L = 100 gives R/L = 0.21973 and for 
L = 200, R/L = 0.218946. Thus we get for L = 100, 
R/L - k = 0.00158, R/L - k - ki/L = 0.00002 and for 
L = 200, R/L-k = 0.000803, R/L-k-kl/L = 0.000003. 
We see that our analytical result by Eq. (|4"T|) for the mean 
fitness expression is well confirmed. 

One can define Ui (m) by integrating the expression of 
wi(m) from Eq. (|45|) . 



V. DISCUSSION 

We calculated the finite genome size corrections to the 
mean fitness and steady state distribution for strongly 
nonlinear evolution models: horizontal gene transfer 
model and diploid parallel evolution model for general 
fitness function. The application of the Hamilton- Jacobi 
equation for the investigation of the master equation, es- 
pecially the finite size corrections, is a rather popular 
method in case of linear master equation for chemistry 
or ecology [34|,[35|. For the models, considered in our 
article, the standard methods of linear algebra and the 
methods of [HI,!! fail - 

The key point of our method is that we investigate the 
point where the coefficient of the correction to the steady 
state distribution disappears. Then the correction to the 
fitness is calculated from the smoothness condition. Our 
method could be applied to other nonlinear probabilistic 
models as well. 

Our formulas could be applied for relatively small 
genome lengths, for the single peak fitness landscape even 
for L ~ 4, see Appendix A. In population genetics usually 
few allele models are investigated with L = 2,3. Fortu- 
nately, for the larger allele numbers we can apply our 
methods. Our formulas, while a bit involved, are less 
cumbersome than the formulas for several allele cases in 
population genetics. 

DBS thanks DARPA Prophecy Program and 
Acadcmia Sinica for financial support. 



Appendix A. Finite L corrections for a parallel 
diploid model with the single-peak fitness function 

Consider the following diploid model studied in [23| : 

dp 2L 2L 

-rr = Pi E Ai i p i ~ H E A 3lPm\ 

+ ^mijPj (A.l) 
j 

with the fitness coefficients 



An = 2s; A u = An = 2sh, for i ^ 1; 



A, 



0, for i > 1, j > 1. 



(A.2) 



Here rriij is the mutation matrix, ma = — 1, my = 1/L 



for dij = 1, and for other cases m, 



0. 



For the steady state we have [8, 23] 

RPi = !'■ X] ' X] '"'.'''i- 



3 

2 L 2 r 



R 



J2 J2 A oipjpi 

2s[(l - 2h)x 2 + 2hx] 



(A.3) 



Here x takes one of the solution x± defined below 
-b±Vb 2 - Aac 1 



x± 



2a 



-,Pi = x 



h-h 



(A.4) 



where a = 1 - 2h, b = 3h - 1, and c = l/(2s) - h. We 
have for the marginal fitness fi, $2 and the fitness R 

fi = 2sx + 2sh(l - x), f 2 = 2shx 
R = xfi + (1 - x)f 2 = 2s[(l - 2h)x 2 + 2hx]. (A.5) 

In terms of the marginal fitness, the original equation 
(Al) can be written in the form 



^ = (r, - R)Pi + (L - I + 1)P,_ X 
dt 

+ (l + l)Pi +1 -LP h 



(A.6) 



where P/ are the class probabilities of the configurations 
at the Hamming distance I from the reference sequence, 
r = Lfi, and n = Lf 2 , 1 > 1. 

Let us denote x = Xq + S, R = R a + k\/L. Then 

kx = 2s[(l-2h)x + 2h}6, 

/i = 2sx + 2sh{\ - x) + 2s(l - h)5. (A.7) 



For the p\ , we have the equation 



x(R(x) + l- h(x)) 



■ Pi- 



(A. 
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Replacing on the right hand side and before the parcn- and for ki = R'(x) * S 
thesis on the left hand side 

x ~ (1 - l/(/i - / 2 )) 
and ki 



i(/i-/a)(l-^) 

Pi^(/i-/ 2 -l)/(/i-/ 2 ) 2 , 1 

— T- ( A -!0) 



r~i/(/i-/ 2 -i), 

wc find a perturbation expression for <5 



- /2)[1 - 2 [(l-2h)x+/ t ]] 



j ^ , For s = 5.3, h = 0.1, L = 4 we get R n — 8.55, while 

+ °(t?) ( A - 9 ) i?o + = 8.49, thus we have 1% accuracy. 
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